Describe data here.
The following packages are needed for this report
library(tidyverse)
library(limma)
library(QFeatures)
library(msqrob2)Initialize all input parameters used in the GUI.
featuresFile <- "featuresFile.txt"
annotationFile <- "annotationFile.xlsx"Specification of feature file and annotation file. Note that the feature file and the annotation file are copied to the zip file for the report and are named featuresFile.txt and annotationFile.xlsx, respectively.
groupBy <- "Proteins"
logTrans <- TRUE
removeRazor <- TRUE
filterColumns <- c("Reverse", "Potential.contaminant")
minObsFeat <- 2L
normMethod <- "center.median"sumMethod <- "robust"The features are summarized according to the variable groupBy=Proteins using summarization method: sumMethod=robust.
form <- "~treatment*time"
doRidge <- FALSEThe data are modelled using the following formula: form=~treatment*time
The parameters are estimated using robust ridge regression if the argument doRidge equals TRUE, otherwise robust regresion without ridge penalization. Here, doRidge=FALSE
contrast <- "treatmentLN+treatmentLN:time2w=0"
sigLevel <- 0.05maxPlot <- 10LIn the report the maxPlot=10 most significant features will be plotted in detail-plots.
We first import the data from FeaturesFile.txt file. This is the file containing your raw Feature-level intensities.
To import the data we use the QFeatures package.
ecols <- grep("Intensity\\.", names(read.delim(featuresFile)))
pe <- readQFeatures(
table = featuresFile, fnames = 1, ecol = ecols,
name = "rawFeatures", sep = "\t"
)
currentAssay <- "rawFeatures"
pe## An instance of class QFeatures containing 1 assays:
## [1] rawFeatures: SummarizedExperiment with 30575 rows and 30 columns
In the following code chunk, we set-up the colData with the information on the design.
exp_annotation <- as.data.frame(unclass(openxlsx::read.xlsx(annotationFile)))
runName <- which(
vapply(exp_annotation, function(x) return(
identical(
sort(
as.character(
colnames(pe[[1]]))
),
sort(
as.character(x))
)
), FUN.VALUE = TRUE))
if (length(runName)>0) runName <- runName[1]
rownames(exp_annotation) <- as.character(
exp_annotation[,runName]
)
exp_annotation <- exp_annotation[colnames(pe[[1]]),]
for (j in colnames(exp_annotation))
colData(pe)[[j]] <- exp_annotation[,j]This section preforms preprocessing for the feature. This include
We calculate how many non zero intensities we have per feature and this will be useful for filtering.
rowData(pe[[currentAssay]])$nNonZero <- rowSums(assay(pe[[currentAssay]]) > 0)features with zero intensities are missing features and should be represent with a NA value rather than 0.
pe <- zeroIsNA(pe, currentAssay) # convert 0 to NA52% of all feature intensities are missing and for some features we do not even measure a signal in any sample.
If the variable logTrans is TRUE than the logtransformation is performed.
logTrans## [1] TRUE
if(logTrans) {
pe <- logTransform(pe, base = 2, i = currentAssay, name = "logFeatures")
currentAssay <- "logFeatures"
} In maxQuant output a peptide can map to multiple proteins, as long as there is none of these proteins present in a smaller subgroup.
if removeRazor is TRUE. Razor features are removed.
removeRazor## [1] TRUE
if (removeRazor) pe[[currentAssay]] <-
pe[[currentAssay]][rowData(pe[[currentAssay]])[,groupBy]
%in% smallestUniqueGroups(rowData(pe[[currentAssay]])[,groupBy]),]The variable filterColumns contains the columns that will be used to filter the data. This currently only works for Maxquant input. For Maxquant values that are indicated with a “+” are filtered, e.g. Decoys (in the column named Reverse) and contaminants (in the columns named Contaminants or Potential.contaminants).
filterColumns## [1] "Reverse" "Potential.contaminant"
if (length(filterColumns)>0)
for (j in filterColumns)
{
rowData(pe[[currentAssay]])[is.na(rowData(pe[[currentAssay]])[,j]),j] <-""
pe[[currentAssay]] <- pe[[currentAssay]][rowData(pe[[currentAssay]])[,j] != "+", ]
}minObsFeat## [1] 2
pe[[currentAssay]] <- pe[[currentAssay]][rowData(pe[[currentAssay]])$nNonZero >= minObsFeat, ]
nrow(pe[[currentAssay]])## [1] 23690
We keep 23690 features upon filtering.
The data are normalized using the center.median method (If normMethod is “none” no normalisation is performed).
normMethod## [1] "center.median"
if (normMethod != "none")
{
pe <- normalize(pe,
i = currentAssay,
name = "normFeatures",
method = normMethod)
currentAssay <- "normFeatures"
} pe[[currentAssay]] %>%
assay %>%
as.data.frame() %>%
gather(sample, intensity) %>%
ggplot(aes(x = intensity, group = sample, color = sample)) +
geom_density()## Warning: Removed 315064 rows containing non-finite values (stat_density).
We can visualize our data using a Multi Dimensional Scaling plot, eg. as provided by the limma package.
pe[[currentAssay]] %>%
assay %>%
limma::plotMDS() The first axis in the plot is showing the leading log fold changes (differences on the log scale) between the samples.
The data are summarized using the summarization method: robust (if sumMethod is “none” no summarization is performed).
sumMethod## [1] "robust"
if (sumMethod!="none")
{
if (sumMethod=="robust") fun <- MsCoreUtils::robustSummary
if (sumMethod=="sum") fun <- base::colSums
if (sumMethod=="mean") fun <- base::colMeans
if (sumMethod=="median") fun <- matrixStats::colMedians
if (sumMethod=="medpolish") fun <- MsCoreUtils::medianPolish
pe <- aggregateFeatures(pe,
i = currentAssay,
fcol = groupBy,
na.rm = TRUE,
name = "sumFeatures",
fun = fun
)
currentAssay <- "sumFeatures"
}## Your quantitative and row data contain missing values. Please read the
## relevant section(s) in the aggregateFeatures manual page regarding the
## effects of missing values on data aggregation.
An MDS plot of the summarized features can be found below.
plotMDS(assay(pe[[currentAssay]]))We model the summarized feature level expression values using msqrob. By default msqrob2 estimates the model parameters using robust regression.
We will model the data using following formula
as.formula(form)## ~treatment * time
pe <- msqrob(object = pe, i = currentAssay, formula = as.formula(form), ridge = doRidge)We can also explore the design of the model that we specified using the the package ExploreModelMatrix
library(ExploreModelMatrix)
visDesign <- VisualizeDesign(colData(pe),form)
visDesign$plotlist## [[1]]
The following contrast will be assessed for each summarized feature:
contrast## [1] "treatmentLN+treatmentLN:time2w=0"
L <- makeContrast(
contrast,
parameterNames = colnames(visDesign[[3]])
)
pe <- hypothesisTest(object = pe, i = currentAssay, contrast = L)The following features are significant at the 0.05 FDR-level.
rowData(pe[[currentAssay]])[,colnames(L)] %>%
arrange(pval) %>%
filter(adjPval < sigLevel) %>%
DT::datatable() %>%
DT::formatSignif(columns = 1:6,digits=3)The FDR is controlled at the 0.05.
sigLevel## [1] 0.05
volcano <- ggplot(rowData(pe[[currentAssay]])[, colnames(L)],
aes(x = logFC, y = -log10(pval), color = adjPval < sigLevel)) +
geom_point(cex = 2.5) +
scale_color_manual(values = alpha(c("black", "red"), 0.5)) + theme_minimal()
volcanoWe first select the names of the summarized features that were declared signficant at the FDR-level of 0.05. If we could reject the null hypothesis related to the specified contrast for more than 1 summarized feature a heatmap will be made for the significant features.
sigLevel## [1] 0.05
sigNames <- rowData(pe[[currentAssay]])[,colnames(L)] %>%
rownames_to_column("feature") %>%
arrange(pval) %>%
filter(adjPval<sigLevel) %>%
pull(feature)
if (length(sigNames) >1) heatmap(assay(pe[[currentAssay]])[sigNames, ]) else cat("No plots are generated because there are no significant summarized features at the", sigLevel, "FDR level")We first extract the normalized rawFeatures expression values for a particular summarized feature. You selected maxPlot=10 so detail plots are constructed for the 10 most significant summarized features that are DA at the specified FDR level of 0.05. Note, that you can increase maxPlot to generate more plots.
maxPlot## [1] 10
if (length(sigNames) > maxPlot)
plotNames <- sigNames[1:maxPlot] else
plotNames <- sigNames
if ("normFeatures" %in% names(pe) & "sumFeatures" %in% names(pe) & length(plotNames) >= 1) for (protName in plotNames)
{
pePlot <- pe[protName, , c("normFeatures","sumFeatures")]
pePlotDf <- data.frame(longFormat(pePlot))
pePlotDf$assay <- factor(pePlotDf$assay,
levels = c("normFeatures","sumFeatures"))
# plotting
p1 <- ggplot(
data = pePlotDf,
aes(x = colname, y = value, group = rowname)
) +
geom_line() +
geom_point() +
theme(axis.text.x = element_text(angle = 70, hjust = 1, vjust = 0.5)) +
facet_grid(~assay) +
ggtitle(protName)
print(p1)
# plotting 2
p2 <- ggplot(
pePlotDf,
aes(x = colname, y = value)) +
geom_boxplot(outlier.shape = NA) +
geom_point(
position = position_jitter(width = .1),
aes(shape = rowname)) +
scale_shape_manual(values = 1:nrow(pePlotDf)) +
labs(title = protName, x = "sample", y = "feature intensity (log2)") +
theme(axis.text.x = element_text(angle = 70, hjust = 1, vjust = 0.5)) +
facet_grid(~assay) +
ggtitle(protName)
print(p2)
} else cat("No plots are generated because there are no significant summarized features at the", sigLevel, "FDR level")With respect to reproducibility, it is highly recommended to include a session info in your script so that readers of your output can see your particular setup of R.
sessionInfo()## R version 4.1.1 (2021-08-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
##
## locale:
## [1] nl_BE.UTF-8/nl_BE.UTF-8/nl_BE.UTF-8/C/nl_BE.UTF-8/nl_BE.UTF-8
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] ExploreModelMatrix_1.4.0 msqrob2_1.0.0
## [3] QFeatures_1.2.0 MultiAssayExperiment_1.18.0
## [5] SummarizedExperiment_1.22.0 Biobase_2.52.0
## [7] GenomicRanges_1.44.0 GenomeInfoDb_1.28.4
## [9] IRanges_2.26.0 S4Vectors_0.30.0
## [11] BiocGenerics_0.38.0 MatrixGenerics_1.4.3
## [13] matrixStats_0.61.0 limma_3.48.3
## [15] forcats_0.5.1 stringr_1.4.0
## [17] dplyr_1.0.7 purrr_0.3.4
## [19] readr_2.0.1 tidyr_1.1.3
## [21] tibble_3.1.4 ggplot2_3.3.5
## [23] tidyverse_1.3.1
##
## loaded via a namespace (and not attached):
## [1] minqa_1.2.4 colorspace_2.0-2 ellipsis_0.3.2
## [4] XVector_0.32.0 fs_1.5.0 clue_0.3-59
## [7] rstudioapi_0.13 farver_2.1.0 DT_0.19
## [10] fansi_0.5.0 lubridate_1.7.10 xml2_1.3.2
## [13] codetools_0.2-18 splines_4.1.1 knitr_1.34
## [16] jsonlite_1.7.2 nloptr_1.2.2.2 broom_0.7.9
## [19] cluster_2.1.2 dbplyr_2.1.1 shinydashboard_0.7.1
## [22] shiny_1.7.0 compiler_4.1.1 httr_1.4.2
## [25] backports_1.2.1 assertthat_0.2.1 Matrix_1.3-4
## [28] fastmap_1.1.0 lazyeval_0.2.2 cli_3.0.1
## [31] later_1.3.0 htmltools_0.5.2 tools_4.1.1
## [34] gtable_0.3.0 glue_1.4.2 GenomeInfoDbData_1.2.6
## [37] Rcpp_1.0.7 cellranger_1.1.0 jquerylib_0.1.4
## [40] vctrs_0.3.8 nlme_3.1-153 crosstalk_1.1.1
## [43] rintrojs_0.3.0 xfun_0.26 openxlsx_4.2.4
## [46] lme4_1.1-27.1 rvest_1.0.1 mime_0.11
## [49] lifecycle_1.0.0 zlibbioc_1.38.0 MASS_7.3-54
## [52] scales_1.1.1 promises_1.2.0.1 hms_1.1.0
## [55] ProtGenerics_1.24.0 AnnotationFilter_1.16.0 yaml_2.2.1
## [58] sass_0.4.0 stringi_1.7.4 highr_0.9
## [61] boot_1.3-28 zip_2.2.0 BiocParallel_1.26.2
## [64] rlang_0.4.11 pkgconfig_2.0.3 bitops_1.0-7
## [67] evaluate_0.14 lattice_0.20-45 htmlwidgets_1.5.4
## [70] labeling_0.4.2 cowplot_1.1.1 tidyselect_1.1.1
## [73] magrittr_2.0.1 R6_2.5.1 generics_0.1.0
## [76] DelayedArray_0.18.0 DBI_1.1.1 pillar_1.6.2
## [79] haven_2.4.3 withr_2.4.2 MsCoreUtils_1.4.0
## [82] RCurl_1.98-1.5 modelr_0.1.8 crayon_1.4.1
## [85] utf8_1.2.2 tzdb_0.1.2 rmarkdown_2.11
## [88] grid_4.1.1 readxl_1.3.1 reprex_2.0.1
## [91] digest_0.6.28 xtable_1.8-4 httpuv_1.6.3
## [94] munsell_0.5.0 bslib_0.3.0 shinyjs_2.0.0